Synthesizing Scenario-based Dataset for User Behavior Pattern Mining
Abstract
User behavior pattern mining has drawn great attention in business and security areas. Realistic and accurate datasets are required for evaluating user behavior pattern mining approaches, their implementations, and their optimization results. Synthetic datasets are crucial because of restricted access to production datasets, security and privacy issues, the need to meet specific consumer requirements, or the high cost of real datasets. This paper presents a synthetic dataset generator that assists data scientists and analysts in designing scenario-driven datasets with embedded user behavior patterns and in visually analyzing the quality of the generated datasets. We developed an interactive data exploration environment to support such a design-generate-visualize-analyze-optimize process. An abstract representation of real-world user behavior patterns is proposed, which allows data analysts to create datasets with both intended and random patterns injected. Dataset generation is controlled by both data statistics (e.g., data size and attribute distribution) and scenario-based user behavior patterns (e.g., association patterns, sequential patterns, and time constraints). A prototype toolkit has been developed to synthesize and analyze datasets in different application domains.
Keywords: behavior pattern; synthetic dataset generation; data mining; visualization; sequential pattern mining; clustering
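To make the two kinds of generation controls concrete, the following is a minimal Python sketch, not the paper's toolkit. It assumes a dataset of `(user_id, timestamp, action)` rows, uses the number of users, events per user, and action weights as the "data statistics", and injects one intended sequential pattern with a maximum time gap between steps as the "scenario". All names (`generate_dataset`, `has_pattern`, `events_by_user`, the action labels) are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def generate_dataset(n_users, events_per_user, actions, weights,
                     pattern, pattern_rate, max_gap, seed=0):
    """Return (rows, injected_users).

    rows is a list of (user_id, timestamp, action) tuples.
    Data statistics: n_users, events_per_user, and the action weights.
    Scenario: with probability pattern_rate a user also receives the
    intended sequential `pattern`, with at most `max_gap` time units
    between consecutive pattern steps.
    """
    rng = random.Random(seed)
    rows = []
    injected_users = set()
    for user in range(n_users):
        t = 0
        # Random background events drawn from the attribute distribution.
        for _ in range(events_per_user):
            t += rng.randint(1, 10)
            action = rng.choices(actions, weights=weights, k=1)[0]
            rows.append((user, t, action))
        # Inject the intended pattern after the background events.
        if rng.random() < pattern_rate:
            for step in pattern:
                t += rng.randint(1, max_gap)
                rows.append((user, t, step))
            injected_users.add(user)
    return rows, injected_users


def events_by_user(rows):
    """Group rows into {user_id: [(timestamp, action), ...]} in row order."""
    grouped = defaultdict(list)
    for user, ts, action in rows:
        grouped[user].append((ts, action))
    return grouped


def has_pattern(events, pattern, max_gap):
    """True if `pattern` occurs as consecutive events with gaps <= max_gap."""
    k = len(pattern)
    for i in range(len(events) - k + 1):
        window = events[i:i + k]
        if [a for _, a in window] != list(pattern):
            continue
        if all(window[j + 1][0] - window[j][0] <= max_gap
               for j in range(k - 1)):
            return True
    return False


# Example: a hypothetical "login, search, purchase" scenario.
rows, injected = generate_dataset(
    n_users=100, events_per_user=8,
    actions=["login", "search", "view", "purchase", "logout"],
    weights=[1, 3, 4, 1, 1],
    pattern=["login", "search", "purchase"],
    pattern_rate=0.3, max_gap=5, seed=42,
)
```

Because the generator returns the set of users that received the intended pattern, a mining approach can be scored against known ground truth; background events may also reproduce the pattern by chance, which corresponds to the random patterns mentioned above.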
Similar resources
Cross-layer Packet-dependant OFDM Scheduling Based on Proportional Fairness
This paper assumes each user has more than one queue, derives a new packet-dependant proportional fairness power allocation pattern based on the sum of weight capacity and the packet’s priority in users’ queues, and proposes 4 new cross-layer packet-dependant OFDM scheduling schemes based on proportional fairness for heterogeneous classes of traffic. Scenario 1, scenario 2 and scenario 3 lead r...
Mining Characteristic Patterns to Identify Web Users
Many web applications and analyses require user identification when direct identification is not available. However, identifying users based on existing attributes, e.g. client IP, in tracking data is often misleading. Instead, because identifying users based on their intrinsic behavior patterns is more effective, we propose using characteristic patterns to capture distinctive user behavior. A ...
A social recommender system based on matrix factorization considering dynamics of user preferences
With the expansion of social networks, the use of recommender systems in these networks has attracted considerable attention. Recommender systems have become an important tool for alleviating the information overload problem of users by providing personalized recommendations of items a user might like, based on past preferences or observed behavior about one or various items. In these systems...
Use of Semantic Similarity and Web Usage Mining to Alleviate the Drawbacks of User-Based Collaborative Filtering Recommender Systems
One of the most famous methods for recommendation is user-based Collaborative Filtering (CF). This system compares the active user’s item ratings with the historical rating records of other users to find similar users, and recommends items that seem interesting to these similar users and have not been rated by the active user. As a way of computing recommendations, the ultimate goal of the user-ba...
Multi-Cluster Based Temporal Mobile Sequential Pattern Mining Using Heuristic Search
An enhanced mobile sequential pattern mining using heuristic search technique is explored to predict mobile user’s behavior effectively. By analyzing the movement of mobile users with respect to time, location and service request, one can contend that users in different user groups may have different mobile transaction behavior. Similar transaction behavior in a set is grouped by applying heuri...
Publication date: 2015